Overview

Dataset statistics

Number of variables: 17
Number of observations: 888228
Missing cells: 3452445
Missing cells (%): 22.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 579.2 MiB
Average record size in memory: 683.7 B
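
The percentages in this overview follow from simple arithmetic on the counts. The sketch below recomputes them. The per-column missing counts are copied from the warnings listed in this report, and all variable names are illustrative only. Because the total size is reported rounded to 0.1 MiB, the average record size can only be reproduced approximately.

```python
# Figures copied from the report; names are illustrative, not part of pandas-profiling.
n_variables = 17
n_observations = 888_228
total_size_mib = 579.2

# Missing-value counts per column, as listed in the report's warnings.
missing_per_column = {
    "maker": 129_461, "model": 283_265, "mileage": 90_469,
    "manufacture_year": 92_350, "engine_displacement": 185_795,
    "engine_power": 138_657, "body_type": 280_220, "color_slug": 835_604,
    "stk_year": 427_803, "transmission": 185_429, "door_count": 153_884,
    "seat_count": 187_285, "fuel_type": 462_223,
}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024**2 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"missing cells: {missing_cells} ({missing_pct:.1f}%)")
print(f"average record size: about {avg_record_bytes:.0f} B")
```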

Variable types

CAT: 11
NUM: 6

Reproduction

Analysis started: 2020-02-26 02:53:54.218969
Analysis finished: 2020-02-26 05:05:20.829865
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

model has a high cardinality: 954 distinct values (High cardinality)
stk_year has a high cardinality: 113 distinct values (High cardinality)
date_created has a high cardinality: 888228 distinct values (High cardinality)
date_last_seen has a high cardinality: 839005 distinct values (High cardinality)
maker has 129461 (14.6%) missing values (Missing)
model has 283265 (31.9%) missing values (Missing)
mileage has 90469 (10.2%) missing values (Missing)
manufacture_year has 92350 (10.4%) missing values (Missing)
engine_displacement has 185795 (20.9%) missing values (Missing)
engine_power has 138657 (15.6%) missing values (Missing)
body_type has 280220 (31.5%) missing values (Missing)
color_slug has 835604 (94.1%) missing values (Missing)
stk_year has 427803 (48.2%) missing values (Missing)
transmission has 185429 (20.9%) missing values (Missing)
door_count has 153884 (17.3%) missing values (Missing)
seat_count has 187285 (21.1%) missing values (Missing)
fuel_type has 462223 (52.0%) missing values (Missing)
price_eur is highly skewed (γ1 = 942.4582842) (Skewed)
date_created only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
date_last_seen only contains datetime values, but is categorical. Consider applying pd.to_datetime() (Type)
mileage has 40475 (4.6%) zeros (Zeros)
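
The Type warnings recommend pd.to_datetime() for date_created and date_last_seen. As a dependency-free illustration of what that conversion has to handle, the sketch below parses the timestamp layout seen in this report with the standard library only. The helper name parse_utc_timestamp is hypothetical, and the sketch assumes every value ends in "+00" (UTC) and carries zero to six fractional digits, as the sample values suggest.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "2016-02-18 04:43:19.77417+00" or "2016-01-18 07:17:32+00".
_TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{1,6}))?\+00$"
)

def parse_utc_timestamp(text):
    """Parse a '+00' timestamp string into a timezone-aware datetime (UTC)."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {text!r}")
    base, fraction = match.groups()
    # Right-pad the fraction so ".77417" becomes 774170 microseconds.
    microseconds = int((fraction or "").ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)

created = parse_utc_timestamp("2016-02-18 04:43:19.77417+00")
print(created.isoformat())
```

Once both columns are real datetimes, quantities such as the time between date_created and date_last_seen can be computed by plain subtraction.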

Variables

df_index
Real number (ℝ≥0)

UNIQUE
Distinct count: 888228
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1778744.5171070942
Minimum: 1
Maximum: 3552909
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 178409.05
Q1: 889836.75
Median: 1779182
Q3: 2667137.5
95th percentile: 3377288.95
Maximum: 3552909
Range: 3552908
Interquartile range (IQR): 1777300.75

Descriptive statistics

Standard deviation: 1026197.063
Coefficient of variation (CV): 0.5769221228
Kurtosis: -1.200018129
Mean: 1778744.517
Mean absolute deviation (MAD): 888793.3012
Skewness: -0.002733294476
Sum: 1.579930685e+12
Variance: 1.053080412e+12
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.000000e+00 3.552909e+06], "bayesian blocks" binning strategy used)
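
Several of the df_index figures are derived from others in the same block. The short sketch below recomputes the range, the IQR, the coefficient of variation, and the variance from the reported values. The variable names are illustrative, and small differences are expected because the report prints rounded numbers.

```python
# Values copied from the df_index block of the report.
minimum, maximum = 1, 3552909
q1, q3 = 889836.75, 2667137.5
mean = 1778744.517
std = 1026197.063

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # Interquartile range = Q3 - Q1
cv = std / mean                   # Coefficient of variation = std / mean
variance = std ** 2               # Variance = std squared

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```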
ValueCountFrequency (%) 
2047 1 < 0.1%
 
3474068 1 < 0.1%
 
1571751 1 < 0.1%
 
1565604 1 < 0.1%
 
1457055 1 < 0.1%
 
3548060 1 < 0.1%
 
1446810 1 < 0.1%
 
3541913 1 < 0.1%
 
1442712 1 < 0.1%
 
2688262 1 < 0.1%
 
Other values (888218) 888218 > 99.9%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
3 1 < 0.1%
 
7 1 < 0.1%
 
10 1 < 0.1%
 
19 1 < 0.1%
 
ValueCountFrequency (%) 
3552909 1 < 0.1%
 
3552905 1 < 0.1%
 
3552902 1 < 0.1%
 
3552900 1 < 0.1%
 
3552892 1 < 0.1%
 

maker
Categorical

MISSING
Distinct count: 46
Unique (%): < 0.1%
Missing: 129461
Missing (%): 14.6%
Memory size: 6.8 MiB
Top values: skoda (78681), volkswagen (74582), bmw (66795), mercedes-benz (63040), audi (61721); other values (41): 413948
ValueCountFrequency (%) 
skoda 78681 8.9%
 
volkswagen 74582 8.4%
 
bmw 66795 7.5%
 
mercedes-benz 63040 7.1%
 
audi 61721 6.9%
 
ford 59858 6.7%
 
opel 54218 6.1%
 
fiat 32957 3.7%
 
citroen 30514 3.4%
 
renault 26505 3.0%
 
Other values (36) 209896 23.6%
 
(Missing) 129461 14.6%
 

Length

Max length: 13
Mean length: 5.630189546
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 25 96.2%
 
Dash_Punctuation 1 3.8%
 
ValueCountFrequency (%) 
Latin 25 96.2%
 
Common 1 3.8%
 
ValueCountFrequency (%) 
ASCII 26 100.0%
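
The three small tables above appear to count the distinct characters used in the column, grouped by Unicode general category, script, and block (for maker: 25 lowercase letters and one dash). A minimal sketch of the category count follows. It uses Python's standard unicodedata module, which returns two-letter abbreviations such as Ll and Pd rather than the long names shown in the report. The function name is hypothetical, and this is not the report's own implementation.

```python
import unicodedata
from collections import defaultdict

def distinct_chars_by_category(values):
    """Count distinct characters per Unicode general category, skipping missing values."""
    distinct_chars = set()
    for value in values:
        if value is not None:
            distinct_chars.update(value)
    chars_by_category = defaultdict(set)
    for char in distinct_chars:
        chars_by_category[unicodedata.category(char)].add(char)
    return {category: len(chars) for category, chars in chars_by_category.items()}

# Two maker values from the report plus one missing entry.
counts = distinct_chars_by_category(["skoda", "mercedes-benz", None])
print(counts)
```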
 

model
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 954
Unique (%): 0.2%
Missing: 283265
Missing (%): 31.9%
Memory size: 6.8 MiB
Top values: octavia (32476), fabia (23024), golf (22855), focus (15124), astra (14434); other values (949): 497050
ValueCountFrequency (%) 
octavia 32476 3.7%
 
fabia 23024 2.6%
 
golf 22855 2.6%
 
focus 15124 1.7%
 
astra 14434 1.6%
 
passat 12775 1.4%
 
a3 12645 1.4%
 
corsa 11619 1.3%
 
fiesta 8794 1.0%
 
polo 8244 0.9%
 
Other values (944) 442973 49.9%
 
(Missing) 283265 31.9%
 

Length

Max length: 23
Mean length: 4.41355035
Min length: 1
ValueCountFrequency (%) 
Lowercase_Letter 26 70.3%
 
Decimal_Number 10 27.0%
 
Dash_Punctuation 1 2.7%
 
ValueCountFrequency (%) 
Latin 26 70.3%
 
Common 11 29.7%
 
ValueCountFrequency (%) 
ASCII 37 100.0%
 

mileage
Real number (ℝ≥0)

MISSING
ZEROS
Distinct count: 144550
Unique (%): 18.1%
Missing: 90469
Missing (%): 10.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 115906.30837759274
Minimum: 0.0
Maximum: 9999999.0
Zeros: 40475
Zeros (%): 4.6%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 18742
Median: 86376
Q3: 158408.5
95th percentile: 255000
Maximum: 9999999
Range: 9999999
Interquartile range (IQR): 139666.5

Descriptive statistics

Standard deviation: 344693.7095
Coefficient of variation (CV): 2.973899474
Kurtosis: 433.8286873
Mean: 115906.3084
Mean absolute deviation (MAD): 92146.77003
Skewness: 19.49107601
Sum: 9.246530066e+10
Variance: 1.188137534e+11
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
0 40475 4.6%
 
10 26831 3.0%
 
1 8693 1.0%
 
100 6533 0.7%
 
5 5315 0.6%
 
150000 3556 0.4%
 
200000 3063 0.3%
 
15 3061 0.3%
 
160000 2867 0.3%
 
170000 2755 0.3%
 
Other values (144540) 694610 78.2%
 
(Missing) 90469 10.2%
 
ValueCountFrequency (%) 
0 40475 4.6%
 
1 8693 1.0%
 
2 1826 0.2%
 
3 663 0.1%
 
4 414 < 0.1%
 
ValueCountFrequency (%) 
9999999 38 < 0.1%
 
9996083 2 < 0.1%
 
9991981 1 < 0.1%
 
9983000 1 < 0.1%
 
9981655 1 < 0.1%
 

manufacture_year
Real number (ℝ≥0)

MISSING
Distinct count: 1117
Unique (%): 0.1%
Missing: 92350
Missing (%): 10.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 2000.8088727166726
Minimum: 0.0
Maximum: 2017.0
Zeros: 29
Zeros (%): < 0.1%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5th percentile: 1997
Q1: 2004
Median: 2009
Q3: 2013
95th percentile: 2015
Maximum: 2017
Range: 2017
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 82.39924251
Coefficient of variation (CV): 0.04118296537
Kurtosis: 243.7447351
Mean: 2000.808873
Mean absolute deviation (MAD): 15.6599588
Skewness: -14.24606305
Sum: 1592399764
Variance: 6789.635166
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2015 109960 12.4%
 
2012 61326 6.9%
 
2011 54899 6.2%
 
2014 50621 5.7%
 
2013 41367 4.7%
 
2007 39644 4.5%
 
2010 39388 4.4%
 
2008 38751 4.4%
 
2006 38594 4.3%
 
2009 36391 4.1%
 
Other values (1107) 284937 32.1%
 
(Missing) 92350 10.4%
 
ValueCountFrequency (%) 
0 29 < 0.1%
 
1 3 < 0.1%
 
2 1 < 0.1%
 
4 1 < 0.1%
 
6 1 < 0.1%
 
ValueCountFrequency (%) 
2017 2764 0.3%
 
2016 31033 3.5%
 
2015 109960 12.4%
 
2014 50621 5.7%
 
2013 41367 4.7%
 

engine_displacement
Real number (ℝ≥0)

MISSING
Distinct count: 4456
Unique (%): 0.6%
Missing: 185795
Missing (%): 20.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 2042.765530662711
Minimum: 0.0
Maximum: 32000.0
Zeros: 1
Zeros (%): < 0.1%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0
5th percentile: 1000
Q1: 1400
Median: 1798
Q3: 1997
95th percentile: 3189
Maximum: 32000
Range: 32000
Interquartile range (IQR): 597

Descriptive statistics

Standard deviation: 1961.148841
Coefficient of variation (CV): 0.9600459826
Kurtosis: 120.9708446
Mean: 2042.765531
Mean absolute deviation (MAD): 695.0110645
Skewness: 9.837364222
Sum: 1434905920
Variance: 3846104.777
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1968 53735 6.0%
 
1598 52525 5.9%
 
1995 28161 3.2%
 
1560 19442 2.2%
 
1197 19042 2.1%
 
1900 18096 2.0%
 
2000 17310 1.9%
 
1896 17126 1.9%
 
1390 16321 1.8%
 
1997 15367 1.7%
 
Other values (4446) 445308 50.1%
 
(Missing) 185795 20.9%
 
ValueCountFrequency (%) 
0 1 < 0.1%
 
10 17 < 0.1%
 
12 4 < 0.1%
 
13 3 < 0.1%
 
14 3 < 0.1%
 
ValueCountFrequency (%) 
32000 200 < 0.1%
 
31999 1 < 0.1%
 
31987 1 < 0.1%
 
31968 5 < 0.1%
 
31966 2 < 0.1%
 

engine_power
Real number (ℝ≥0)

MISSING
Distinct count: 532
Unique (%): 0.1%
Missing: 138657
Missing (%): 15.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 98.44853789701043
Minimum: 1.0
Maximum: 999.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 1
5th percentile: 48
Q1: 68
Median: 86
Q3: 110
95th percentile: 184
Maximum: 999
Range: 998
Interquartile range (IQR): 42

Descriptive statistics

Standard deviation: 49.01260876
Coefficient of variation (CV): 0.4978500423
Kurtosis: 14.79790505
Mean: 98.4485379
Mean absolute deviation (MAD): 32.98080479
Skewness: 2.762575673
Sum: 73794169
Variance: 2402.235817
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
103 41665 4.7%
 
110 40716 4.6%
 
77 38068 4.3%
 
66 36566 4.1%
 
55 31407 3.5%
 
81 30445 3.4%
 
85 28937 3.3%
 
74 23003 2.6%
 
125 20719 2.3%
 
100 19294 2.2%
 
Other values (522) 438751 49.4%
 
(Missing) 138657 15.6%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
2 2 < 0.1%
 
3 7 < 0.1%
 
4 2 < 0.1%
 
6 2 < 0.1%
 
ValueCountFrequency (%) 
999 4 < 0.1%
 
998 2 < 0.1%
 
997 3 < 0.1%
 
995 1 < 0.1%
 
968 2 < 0.1%
 

body_type
Categorical

MISSING
Distinct count: 9
Unique (%): < 0.1%
Missing: 280220
Missing (%): 31.5%
Memory size: 6.8 MiB
Top values: other (491585), compact (60348), coupe (17858), stationwagon (17517), van (7813); other values (4): 12887
ValueCountFrequency (%) 
other 491585 55.3%
 
compact 60348 6.8%
 
coupe 17858 2.0%
 
stationwagon 17517 2.0%
 
van 7813 0.9%
 
offroad 5683 0.6%
 
sedan 4856 0.5%
 
convertible 1291 0.1%
 
transporter 1057 0.1%
 
(Missing) 280220 31.5%
 

Length

Max length: 12
Mean length: 4.654033649
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 20 100.0%
 
ValueCountFrequency (%) 
Latin 20 100.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

color_slug
Categorical

MISSING
Distinct count: 14
Unique (%): < 0.1%
Missing: 835604
Missing (%): 94.1%
Memory size: 6.8 MiB
Top values: black (10680), white (10316), blue (9437), silver (8161), red (5001); other values (9): 9029
ValueCountFrequency (%) 
black 10680 1.2%
 
white 10316 1.2%
 
blue 9437 1.1%
 
silver 8161 0.9%
 
red 5001 0.6%
 
green 2319 0.3%
 
brown 2252 0.3%
 
grey 1634 0.2%
 
beige 1052 0.1%
 
yellow 571 0.1%
 
Other values (4) 1201 0.1%
 
(Missing) 835604 94.1%
 

Length

Max length: 6
Mean length: 3.104806424
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 20 100.0%
 
ValueCountFrequency (%) 
Latin 20 100.0%
 
ValueCountFrequency (%) 
ASCII 20 100.0%
 

stk_year
Categorical

HIGH CARDINALITY
MISSING
Distinct count: 113
Unique (%): < 0.1%
Missing: 427803
Missing (%): 48.2%
Memory size: 6.8 MiB
Top values: None (326432), 2018 (45875), 2017 (45220), 2016 (31129), 2019 (11105); other values (108): 664
ValueCountFrequency (%) 
None 326432 36.8%
 
2018 45875 5.2%
 
2017 45220 5.1%
 
2016 31129 3.5%
 
2019 11105 1.3%
 
2015 212 < 0.1%
 
2020 209 < 0.1%
 
2021 26 < 0.1%
 
3000 18 < 0.1%
 
2500 16 < 0.1%
 
Other values (103) 183 < 0.1%
 
(Missing) 427803 48.2%
 

Length

Max length: 4
Mean length: 3.518363528
Min length: 3
ValueCountFrequency (%) 
Decimal_Number 10 66.7%
 
Lowercase_Letter 4 26.7%
 
Uppercase_Letter 1 6.7%
 
ValueCountFrequency (%) 
Common 10 66.7%
 
Latin 5 33.3%
 
ValueCountFrequency (%) 
ASCII 15 100.0%
 

transmission
Categorical

MISSING
Distinct count: 2
Unique (%): < 0.1%
Missing: 185429
Missing (%): 20.9%
Memory size: 6.8 MiB
Top values: man (505358), auto (197441)
ValueCountFrequency (%) 
man 505358 56.9%
 
auto 197441 22.2%
 
(Missing) 185429 20.9%
 

Length

Max length: 4
Mean length: 3.222286395
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 6 100.0%
 
ValueCountFrequency (%) 
Latin 6 100.0%
 
ValueCountFrequency (%) 
ASCII 6 100.0%
 

door_count
Categorical

MISSING
Distinct count: 14
Unique (%): < 0.1%
Missing: 153884
Missing (%): 17.3%
Memory size: 6.8 MiB
Top values: 4 (282207), 5 (224029), None (118487), 2 (76926), 3 (30186); other values (9): 2509
ValueCountFrequency (%) 
4 282207 31.8%
 
5 224029 25.2%
 
None 118487 13.3%
 
2 76926 8.7%
 
3 30186 3.4%
 
0 2083 0.2%
 
6 324 < 0.1%
 
1 84 < 0.1%
 
7 10 < 0.1%
 
55 4 < 0.1%
 
Other values (4) 4 < 0.1%
 
(Missing) 153884 17.3%
 

Length

Max length: 4
Mean length: 1.746695668
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 9 64.3%
 
Lowercase_Letter 4 28.6%
 
Uppercase_Letter 1 7.1%
 
ValueCountFrequency (%) 
Common 9 64.3%
 
Latin 5 35.7%
 
ValueCountFrequency (%) 
ASCII 14 100.0%
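
door_count is typed as categorical partly because it contains the literal string "None" (118487 rows), which the report counts as a regular value rather than as missing. The same pattern shows up in stk_year and seat_count. A minimal sketch of one way to normalise such a column follows, assuming that "None" is a stringified null. That reading is an assumption, not something the report states, and the helper name parse_count is hypothetical.

```python
def parse_count(raw):
    """Return an int for an ASCII digit string; None for missing, 'None', or other text."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.isascii() and text.isdigit():
        return int(text)
    return None

# Values as they appear in the door_count frequency table, plus a float NaN.
cleaned = [parse_count(v) for v in ["4", "5", "None", "55", float("nan")]]
print(cleaned)

# Under the assumption above, the effective missing share of door_count
# is the reported missing count plus the "None" strings.
total_rows = 888_228
effective_missing_pct = 100 * (153_884 + 118_487) / total_rows
print(f"{effective_missing_pct:.1f}%")
```

Implausible values such as 55 doors still parse as integers here, so a range check would be a separate step.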
 

seat_count
Categorical

MISSING
Distinct count: 42
Unique (%): < 0.1%
Missing: 187285
Missing (%): 21.1%
Memory size: 6.8 MiB
Top values: 5 (442656), None (134217), 4 (61116), 7 (24875), 2 (18086); other values (37): 19993
ValueCountFrequency (%) 
5 442656 49.8%
 
None 134217 15.1%
 
4 61116 6.9%
 
7 24875 2.8%
 
2 18086 2.0%
 
3 8312 0.9%
 
6 3574 0.4%
 
9 3121 0.4%
 
0 3027 0.3%
 
8 1730 0.2%
 
Other values (32) 229 < 0.1%
 
(Missing) 187285 21.1%
 

Length

Max length: 4
Mean length: 1.875124405
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 10 66.7%
 
Lowercase_Letter 4 26.7%
 
Uppercase_Letter 1 6.7%
 
ValueCountFrequency (%) 
Common 10 66.7%
 
Latin 5 33.3%
 
ValueCountFrequency (%) 
ASCII 15 100.0%
 

fuel_type
Categorical

MISSING
Distinct count: 5
Unique (%): < 0.1%
Missing: 462223
Missing (%): 52.0%
Memory size: 6.8 MiB
Top values: gasoline (225760), diesel (191467), electric (6653), lpg (1844), cng (281)
ValueCountFrequency (%) 
gasoline 225760 25.4%
 
diesel 191467 21.6%
 
electric 6653 0.7%
 
lpg 1844 0.2%
 
cng 281 < 0.1%
 
(Missing) 462223 52.0%
 

Length

Max length: 8
Mean length: 4.954977776
Min length: 3
ValueCountFrequency (%) 
Lowercase_Letter 13 100.0%
 
ValueCountFrequency (%) 
Latin 13 100.0%
 
ValueCountFrequency (%) 
ASCII 13 100.0%
 

date_created
Categorical

HIGH CARDINALITY
TYPE DATE
UNIFORM
UNIQUE
Distinct count: 888228
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 MiB
Top values: 2016-01-18 07:17:32.222776+00 (1), 2016-09-24 18:10:16.125849+00 (1), 2015-12-10 10:22:18.042673+00 (1), 2016-06-24 18:16:55.069692+00 (1), 2016-02-18 04:43:19.77417+00 (1); other values (888223): 888223
ValueCountFrequency (%) 
2016-01-18 07:17:32.222776+00 1 < 0.1%
 
2016-09-24 18:10:16.125849+00 1 < 0.1%
 
2015-12-10 10:22:18.042673+00 1 < 0.1%
 
2016-06-24 18:16:55.069692+00 1 < 0.1%
 
2016-02-18 04:43:19.77417+00 1 < 0.1%
 
2016-01-11 02:45:08.422225+00 1 < 0.1%
 
2016-02-26 11:02:40.707129+00 1 < 0.1%
 
2016-02-25 02:00:10.44408+00 1 < 0.1%
 
2016-02-17 20:17:26.904174+00 1 < 0.1%
 
2016-02-23 06:28:47.4043+00 1 < 0.1%
 
Other values (888218) 888218 > 99.9%
 

Length

Max length: 29
Mean length: 28.88901273
Min length: 22
ValueCountFrequency (%) 
Decimal_Number 10 66.7%
 
Other_Punctuation 2 13.3%
 
Space_Separator 1 6.7%
 
Math_Symbol 1 6.7%
 
Dash_Punctuation 1 6.7%
 
ValueCountFrequency (%) 
Common 15 100.0%
 
ValueCountFrequency (%) 
ASCII 15 100.0%
 

date_last_seen
Categorical

HIGH CARDINALITY
TYPE DATE
Distinct count: 839005
Unique (%): 94.5%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 MiB
Top values: 2016-01-27 20:40:15.46361+00 (49224), 2016-07-03 18:15:27.873481+00 (1), 2017-03-16 01:12:36.938189+00 (1), 2016-07-03 17:43:07.532685+00 (1), 2016-07-03 17:25:41.771938+00 (1); other values (839000): 839000
ValueCountFrequency (%) 
2016-01-27 20:40:15.46361+00 49224 5.5%
 
2016-07-03 18:15:27.873481+00 1 < 0.1%
 
2017-03-16 01:12:36.938189+00 1 < 0.1%
 
2016-07-03 17:43:07.532685+00 1 < 0.1%
 
2016-07-03 17:25:41.771938+00 1 < 0.1%
 
2017-01-27 09:14:33.037769+00 1 < 0.1%
 
2016-02-10 20:36:29.294882+00 1 < 0.1%
 
2015-12-16 19:46:54.735655+00 1 < 0.1%
 
2016-07-03 18:20:36.835577+00 1 < 0.1%
 
2016-02-11 08:54:42.891795+00 1 < 0.1%
 
Other values (838995) 838995 94.5%
 

Length

Max length: 29
Mean length: 28.83996789
Min length: 22
ValueCountFrequency (%) 
Decimal_Number 10 66.7%
 
Other_Punctuation 2 13.3%
 
Space_Separator 1 6.7%
 
Math_Symbol 1 6.7%
 
Dash_Punctuation 1 6.7%
 
ValueCountFrequency (%) 
Common 15 100.0%
 
ValueCountFrequency (%) 
ASCII 15 100.0%
 

price_eur
Real number (ℝ≥0)

SKEWED
Distinct count: 112039
Unique (%): 12.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3062208.4001452764
Minimum: 0.04
Maximum: 2706149053064.4
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 MiB

Quantile statistics

Minimum: 0.04
5th percentile: 1073.28
Q1: 1295.34
Median: 7327.91
Q3: 16281.79
95th percentile: 34848.916
Maximum: 2.706149053e+12
Range: 2.706149053e+12
Interquartile range (IQR): 14986.45

Descriptive statistics

Standard deviation: 2871372340
Coefficient of variation (CV): 937.6802507
Kurtosis: 888227.745
Mean: 3062208.4
Mean absolute deviation (MAD): 6100290.459
Skewness: 942.4582842
Sum: 2.719939243e+12
Variance: 8.244779116e+18
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[4.00000000e-02 4.50000000e-02 6.50000000e-02 7.50000000e-02 1.15000000e-01 ... 2.69401098e+07 2.69503514e+07 2.87679968e+07 1.28878612e+08 2.70614905e+12], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
1295.34 168938 19.0%
 
9900 1643 0.2%
 
10900 1616 0.2%
 
11900 1574 0.2%
 
12900 1546 0.2%
 
8900 1489 0.2%
 
13900 1411 0.2%
 
14900 1402 0.2%
 
3500 1400 0.2%
 
6900 1398 0.2%
 
Other values (112029) 705811 79.5%
 
ValueCountFrequency (%) 
0.04 788 0.1%
 
0.05 29 < 0.1%
 
0.06 41 < 0.1%
 
0.07 117 < 0.1%
 
0.08 16 < 0.1%
 
ValueCountFrequency (%) 
2.706149053e+12 1 < 0.1%
 
971219350 1 < 0.1%
 
157668401.9 1 < 0.1%
 
100088822.1 1 < 0.1%
 
100003700 1 < 0.1%
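
The extreme skewness of price_eur is driven by a handful of very large values: the maximum alone accounts for almost the entire sum. One common way to flag such values is Tukey's 1.5 × IQR rule, sketched below with the quartiles reported above. The rule and its 1.5 multiplier are an assumption made for this illustration, not something the report applies, and the sample prices are a few values taken from the tables in this section.

```python
# Quartiles copied from the price_eur quantile statistics.
q1, q3 = 1295.34, 16281.79
iqr = q3 - q1

# Tukey fences: values outside [lower_fence, upper_fence] are flagged.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

sample_prices = [0.04, 1295.34, 6900.00, 55900.00, 2.706149053e12]
flagged = [p for p in sample_prices if p < lower_fence or p > upper_fence]

print(f"fences: [{lower_fence:.2f}, {upper_fence:.2f}]")
print("flagged:", flagged)
```

Because the lower fence is negative, this rule does not catch the implausibly small prices such as 0.04. Those would need a separate, explicitly chosen lower bound.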
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear function, the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
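
As a minimal sketch of that calculation in plain Python (the function name and the sample numbers are illustrative, not taken from the dataset):

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_r(xs, ys)

# Shifting and rescaling one variable (by a positive factor) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
print(round(r, 4), round(r_rescaled, 4))
```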

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
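
A minimal sketch of that procedure in plain Python: rank both variables, then apply the Pearson formula to the ranks. Tied values receive the average of their ranks, which is a common convention and an assumption here. All function names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A monotonic but nonlinear relationship: rho is 1 while r is below 1.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```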

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
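
The pair-counting definition translates directly into code. The sketch below implements exactly the formula described here, which is usually called τ-a. Statistical libraries commonly default to the tie-adjusted τ-b variant, so their results can differ when the data contains ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # pairs tied in x or y count as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# One swapped pair out of six: 5 concordant, 1 discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```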

Missing values

Sample

First rows

(index) | df_index | maker | model | mileage | manufacture_year | engine_displacement | engine_power | body_type | color_slug | stk_year | transmission | door_count | seat_count | fuel_type | date_created | date_last_seen | price_eur
0 | 2282567 | ford | fiesta | 129000.0 | 2009.0 | NaN | 88.0 | other | NaN | NaN | man | NaN | NaN | NaN | 2016-03-01 04:22:44.466271+00 | 2016-07-03 17:16:27.50003+00 | 6900.00
1 | 845119 | seat | leon | NaN | 2004.0 | 4245.0 | NaN | compact | NaN | None | NaN | None | None | gasoline | 2015-12-22 06:54:54.986737+00 | 2016-01-07 10:08:16.993797+00 | 3886.01
2 | 645106 | smart | fortwo | 99985.0 | 2004.0 | 698.0 | 37.0 | NaN | NaN | None | auto | 2 | 2 | gasoline | 2015-12-12 17:23:52.693186+00 | 2016-02-10 20:13:12.406986+00 | 2000.22
3 | 2468530 | ford | focus | 212020.0 | 2002.0 | 1753.0 | 66.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-03-04 22:05:51.891484+00 | 2016-07-03 17:34:54.394008+00 | 1600.00
4 | 3478350 | volkswagen | golf | 180000.0 | 2003.0 | 1900.0 | 74.0 | other | NaN | NaN | man | NaN | NaN | NaN | 2017-02-28 18:34:33.090724+00 | 2017-03-06 01:07:20.215485+00 | 1295.34
5 | 1808645 | mercedes-benz | NaN | 188000.0 | 2002.0 | NaN | 105.0 | other | NaN | NaN | man | 4 | 5 | NaN | 2016-02-19 02:06:25.780919+00 | 2016-07-03 18:31:35.520083+00 | 2199.81
6 | 1135859 | fiat | grande-punto | 85613.0 | 2007.0 | 1248.0 | 55.0 | NaN | black | None | man | 3 | 5 | diesel | 2016-01-09 12:31:01.980036+00 | 2016-07-03 17:02:41.352873+00 | 3800.00
7 | 589975 | NaN | NaN | 36500.0 | 2013.0 | 1560.0 | 84.0 | NaN | NaN | None | man | 4 | 5 | diesel | 2015-12-10 08:46:37.466293+00 | 2016-01-24 20:50:30.634275+00 | 12176.46
8 | 3522373 | volvo | s40 | 187192.0 | 2004.0 | 2435.0 | 125.0 | sedan | blue | NaN | NaN | 4 | 5 | gasoline | 2017-03-10 16:11:27.006883+00 | 2017-03-10 16:11:27.006883+00 | 1295.34
9 | 2167119 | renault | megane | 116000.0 | 2011.0 | 1461.0 | 81.0 | other | NaN | 2018 | auto | 5 | 5 | NaN | 2016-02-27 02:17:25.21999+00 | 2016-07-03 19:27:06.221456+00 | 8950.00

Last rows

(index) | df_index | maker | model | mileage | manufacture_year | engine_displacement | engine_power | body_type | color_slug | stk_year | transmission | door_count | seat_count | fuel_type | date_created | date_last_seen | price_eur
888218 | 3419461 | opel | astra | 176000.0 | NaN | NaN | NaN | other | NaN | 2019 | man | NaN | NaN | NaN | 2017-02-15 18:46:20.92016+00 | 2017-02-18 01:53:02.101921+00 | 1295.34
888219 | 2382921 | mercedes-benz | NaN | 269000.0 | 2012.0 | 1796.0 | 80.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-03-03 08:30:42.61805+00 | 2016-07-03 17:26:15.120508+00 | 9007.66
888220 | 3153466 | audi | 200 | 268000.0 | 2003.0 | NaN | NaN | other | NaN | NaN | NaN | NaN | NaN | electric | 2016-12-08 18:03:25.632454+00 | 2017-02-07 06:00:24.639179+00 | 1295.34
888221 | 2631653 | skoda | fabia | 13254.0 | 2012.0 | 1197.0 | 63.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-03-08 18:44:41.233901+00 | 2016-07-03 17:51:25.907082+00 | 11450.00
888222 | 1950514 | renault | clio | 97000.0 | 2011.0 | 1461.0 | 55.0 | other | NaN | NaN | man | 5 | 5 | NaN | 2016-02-22 18:01:15.122266+00 | 2016-07-03 19:11:52.179141+00 | 7500.00
888223 | 1280030 | NaN | NaN | 168000.0 | 2006.0 | NaN | NaN | van | NaN | None | man | None | None | diesel | 2016-01-17 03:56:11.289348+00 | 2016-01-20 13:18:33.010844+00 | 4490.00
888224 | 1818013 | mercedes-benz | NaN | 108000.0 | 1991.0 | 1997.0 | 90.0 | other | NaN | NaN | man | 4 | 5 | NaN | 2016-02-19 07:02:39.933688+00 | 2016-07-03 18:26:21.561088+00 | 2991.12
888225 | 2482226 | ford | NaN | 2.0 | 2015.0 | 3198.0 | 147.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-03-05 03:43:50.498735+00 | 2016-07-03 17:36:26.70881+00 | 28140.53
888226 | 290990 | bmw | x1 | 65000.0 | 2010.0 | 1995.0 | 150.0 | NaN | NaN | None | man | 4 | 5 | diesel | 2015-12-02 03:15:58.497633+00 | 2015-12-14 04:46:05.466569+00 | 19902.22
888227 | 2146943 | maserati | ghibli | 17800.0 | 2015.0 | 2987.0 | 202.0 | other | NaN | NaN | auto | 4 | 5 | NaN | 2016-02-26 17:11:52.695165+00 | 2016-07-03 19:26:17.572593+00 | 55900.00